Proper Nouns Recognition in Arabic Crime Text Using Machine Learning Approach
Abstract
Named Entity Recognition (NER) identifies proper nouns in a text and assigns each one to a named-entity category. It enables the extraction of names of people, locations, organizations, and currencies. Much research has addressed Arabic NER. However, recognizing Arabic named entities is challenging because of the complexity of the Arabic language. One difficulty is the absence of capitalization, a feature that otherwise facilitates NER. Furthermore, there is a lack of lexical corpora covering all Arabic named entities. In addition, most approaches proposed for Arabic NER are based on handcrafted rules, which are laborious and time-consuming to build. This paper therefore presents our attempt at recognizing and extracting the most important named entities, such as names of persons, locations, organizations, crime types, dates, and times, in Arabic crime documents using a Decision Tree (DT) classifier and feature extraction on a crime dataset. The dataset, in varying sizes, was collected from online resources and underwent multiple pre-processing tasks. Feature extraction covers POS tagging, keyword triggers, definite articles, and affixes, and the classifier uses these features to classify the named entities. The results show that the DT yields good results on small datasets but fails when the dataset is large in the crime domain. This is due to the difficulty of identifying relevant keywords and features in such fields. The best result in this experiment is an F-measure of 81.35%.
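The four feature families named in the abstract (POS tags, keyword triggers, definite articles, and affixes) can be illustrated with a minimal sketch. It is not the paper's implementation: the trigger list, the feature names, the two-character affix length, and the POS labels are all assumptions made for illustration, and the POS tags are taken as given from an external tagger. Feature dictionaries of this kind could then be vectorized and passed to a decision tree learner.

```python
# Illustrative token-level features for Arabic NER.
# Assumed design for illustration only; not the paper's code.

DEFINITE_ARTICLE = "ال"  # the Arabic definite article "al-"

# Hypothetical trigger words that often precede an entity.
# The paper's actual keyword lists are not given in the abstract.
TRIGGERS = {
    "السيد": "PERSON",        # "Mr."
    "مدينة": "LOCATION",      # "city of"
    "شركة": "ORGANIZATION",   # "company"
    "جريمة": "CRIME",         # "crime of"
}


def token_features(tokens, pos_tags, i):
    """Build a feature dict for tokens[i] from POS, trigger, article, and affix cues."""
    token = tokens[i]
    prev_token = tokens[i - 1] if i > 0 else ""
    return {
        "pos": pos_tags[i],
        "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
        # Entity type suggested by the preceding word, if it is a trigger.
        "prev_trigger": TRIGGERS.get(prev_token, "NONE"),
        # True when the token starts with "al-" and has more letters after it.
        "has_definite_article": token.startswith(DEFINITE_ARTICLE) and len(token) > 2,
        # Crude affix features: first and last two characters.
        "prefix2": token[:2],
        "suffix2": token[-2:],
    }


# "Mr. Ahmed arrived in the city of Cairo"
tokens = ["وصل", "السيد", "أحمد", "إلى", "مدينة", "القاهرة"]
pos_tags = ["VERB", "NOUN", "PROPN", "ADP", "NOUN", "PROPN"]  # assumed tagger output
features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

In this sketch the trigger feature looks only one token back; a wider window or gazetteer lookups would be natural extensions, but the abstract does not say which the authors used.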
Similar resources
Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "tweets" via one of the machine l...
A New Method for Extracting Named Entities in Classical Arabic
In Natural Language Processing (NLP) studies, developing resources and tools contributes to the extension and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted NLP researchers because of its significant impact on improving other NLP tasks such as machine translation, information retrieval, question answering, query result...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrid. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...